feat: kaikki fallback + definition code cleanup #138

Merged
Hugo0 merged 2 commits into main from feat/llm-definition-system on Mar 4, 2026
Hugo0 commented Mar 4, 2026

Summary

Follow-up to #137. Adds kaikki (offline Wiktionary data) as a fallback tier when GPT-5.2 returns no definition, and cleans up code quality.

Changes

  • Kaikki fallback: When the LLM returns null (low confidence / unrecognized word), the lookup falls back to kaikki native → kaikki English before giving up. Kaikki covers ~60% of the Dutch and ~52% of the Romanian words for which the LLM has low confidence (mostly rare vocabulary).
  • DRY: Merged duplicate lookup_kaikki_native/lookup_kaikki_english into shared _lookup_kaikki()
  • Cleanup: Removed unused strip_html() and import re (deprecated parser has its own copy)
  • Fix: Kaikki results now include Wiktionary URLs (were returning None)
  • Fix: Docstring updated from "2-tier" to "3-tier"
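
As a rough illustration of the DRY change and the URL fix above, the sketch below shows one way a shared `_lookup_kaikki()` could back both public lookups and attach a Wiktionary URL. It is not the code from `webapp/definitions.py`: the in-memory `_KAIKKI_DATA` table, the `variant` parameter, the result keys, and the `_wiktionary_url()` helper are all assumptions made for the example, and the real code may map language codes to Wiktionary editions differently.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical in-memory stand-in for the offline kaikki data,
# keyed by (variant, language code, lowercased word).
_KAIKKI_DATA = {
    ("native", "nl", "fiets"): "tweewielig voertuig",
    ("english", "ro", "carte"): "book",
}


def _wiktionary_url(word: str, wiki_lang: str) -> str:
    """Build a Wiktionary page URL; the word is percent-encoded."""
    return f"https://{wiki_lang}.wiktionary.org/wiki/{quote(word)}"


def _lookup_kaikki(word: str, lang: str, variant: str) -> Optional[dict]:
    """Shared lookup for both variants ("native" or "english")."""
    definition = _KAIKKI_DATA.get((variant, lang, word.lower()))
    if definition is None:
        return None
    # Assumption: native definitions link to the language's own Wiktionary
    # edition, English definitions link to en.wiktionary.org.
    wiki_lang = lang if variant == "native" else "en"
    return {
        "definition": definition,
        "source": f"kaikki-{variant}",
        "url": _wiktionary_url(word, wiki_lang),
    }


def lookup_kaikki_native(word: str, lang: str) -> Optional[dict]:
    """Thin wrapper: definition written in the word's own language."""
    return _lookup_kaikki(word, lang, "native")


def lookup_kaikki_english(word: str, lang: str) -> Optional[dict]:
    """Thin wrapper: English-language definition of the word."""
    return _lookup_kaikki(word, lang, "english")
```

Keeping the two public names as thin wrappers means existing callers do not change while the lookup and URL logic live in one place.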

Definition flow

1. Disk cache (pre-generated)
2. LLM (GPT-5.2) — structured JSON with confidence scoring
3. Kaikki native (offline, 13 major languages)
4. Kaikki English (offline, 50+ languages)
5. None → frontend hides card
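
The flow above can be sketched as an ordered chain of lookups behind a cache. This is a minimal sketch under stated assumptions, not the project's `fetch_definition()`: the signature, the plain `dict` standing in for the disk cache, and the `tiers` list of callables are hypothetical and chosen to keep the example self-contained.

```python
from typing import Callable, Dict, List, Optional

# A tier is any callable taking (word, lang) and returning a definition
# dict, or None when it has nothing to offer.
Lookup = Callable[[str, str], Optional[dict]]


def fetch_definition(
    word: str,
    lang: str,
    cache: Dict[str, Optional[dict]],
    tiers: List[Lookup],
) -> Optional[dict]:
    """Return the first available definition, consulting the cache first."""
    key = f"{lang}:{word}"
    if key in cache:
        # A stored None is a negative entry: all tiers failed earlier.
        return cache[key]
    for lookup in tiers:
        result = lookup(word, lang)
        if result is not None:
            cache[key] = result  # positive cache, whichever tier answered
            return result
    cache[key] = None  # negative cache, written only after every tier failed
    return None
```

Passing the tiers in order (LLM, kaikki native, kaikki English) reproduces steps 2 to 4; a `None` return corresponds to step 5, where the frontend hides the card.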

Test plan

  • 25 definition tests pass (5 new for kaikki fallback)
  • Kaikki fallback results are cached to disk
  • Negative cache only written when all tiers fail
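
To make the two caching points above concrete, here is a small sketch of a JSON-on-disk cache that stores positive and negative entries. The file layout, the `NEGATIVE_TTL_SECONDS` value, and the function names are assumptions for illustration only; the PR does not show the actual cache format. The sketch also assumes words are safe to use as file names.

```python
import json
import time
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical lifetime for negative entries; positive entries never expire here.
NEGATIVE_TTL_SECONDS = 7 * 24 * 3600


def _cache_path(cache_dir: Path, lang: str, word: str) -> Path:
    return cache_dir / lang / f"{word}.json"


def write_cache(cache_dir: Path, lang: str, word: str, result: Optional[dict]) -> None:
    """Store a result; result=None records a negative entry."""
    path = _cache_path(cache_dir, lang, word)
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"result": result, "ts": time.time()}
    path.write_text(json.dumps(entry), encoding="utf-8")


def read_cache(
    cache_dir: Path, lang: str, word: str, now: Optional[float] = None
) -> Tuple[bool, Optional[dict]]:
    """Return (hit, result). An expired negative entry counts as a miss."""
    path = _cache_path(cache_dir, lang, word)
    if not path.exists():
        return (False, None)
    entry = json.loads(path.read_text(encoding="utf-8"))
    if entry["result"] is None:
        current = time.time() if now is None else now
        if current - entry["ts"] > NEGATIVE_TTL_SECONDS:
            return (False, None)
    return (True, entry["result"])
```

Returning a `(hit, result)` pair lets the caller tell "nothing cached" apart from "cached as having no definition", which is what allows the negative entry to be written only after every tier has failed.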

@coderabbitai full review

Summary by CodeRabbit

  • New Features

    • Added automatic fallback to offline sources when primary definition lookup fails
    • Implemented disk-based caching for improved lookup performance
    • Enhanced multi-language definition support
  • Bug Fixes

    • Improved overall definition retrieval reliability with cascading data sources

Hugo0 added 2 commits March 4, 2026 20:54
Flow is now: disk cache → LLM (GPT-5.2) → kaikki native → kaikki English.
Covers ~60% of Dutch and ~52% of Romanian words where LLM has low confidence.
coderabbitai bot commented Mar 4, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This change introduces a 3-tier definition lookup (disk cache → LLM → offline Kaikki data), replacing the previous 2-tier flow, and unifies the native and English Kaikki lookups in a single helper. Unit tests cover fallback behavior, positive caching, and negative caching.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Implementation: webapp/definitions.py | Introduces 3-tier definition lookup with unified _lookup_kaikki() handling both native and English variants. Adds centralized Wiktionary URL construction with language mappings. Refactors LLM response handling to extract and trim def_native and def_en fields. Integrates Kaikki fallback after LLM attempts fail, maintaining negative caching behavior. |
| Test Coverage: tests/test_definitions.py | Adds 63 lines of test cases covering Kaikki fallback when LLM returns None, English fallback when native data missing, disk caching of fallback results, and negative cache behavior with TTL and skip flags. |
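
The LLM response handling mentioned in the summary (extracting and trimming `def_native` and `def_en`) might look roughly like the sketch below. Only the two field names come from the PR; the `confidence` field name, the `MIN_CONFIDENCE` threshold, and the `parse_llm_response()` function are hypothetical.

```python
import json
from typing import Optional

# Hypothetical threshold below which the answer is treated as "no definition".
MIN_CONFIDENCE = 0.5


def _clean(value: object) -> str:
    """Trim string fields; anything else becomes an empty string."""
    return value.strip() if isinstance(value, str) else ""


def parse_llm_response(raw: str) -> Optional[dict]:
    """Parse the model's JSON reply; return None so later tiers can run."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or confidence < MIN_CONFIDENCE:
        return None
    def_native = _clean(data.get("def_native"))
    def_en = _clean(data.get("def_en"))
    if not def_native and not def_en:
        return None
    return {
        "def_native": def_native,
        "def_en": def_en,
        "confidence": float(confidence),
    }
```

Returning `None` for malformed, low-confidence, or empty replies gives the caller a single signal for moving on to the Kaikki tiers.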

Sequence Diagram

sequenceDiagram
    participant Caller as fetch_definition()
    participant Cache as Disk Cache
    participant LLM as LLM (GPT-5.2)
    participant Kaikki as Kaikki Offline Data
    
    Caller->>Cache: Check disk cache
    alt Cache hit
        Cache-->>Caller: Return cached definition
    else Cache miss
        Caller->>LLM: Request definition
        alt LLM returns data
            LLM-->>Caller: Definition with def_native, def_en
            Caller->>Cache: Store positive cache
            Cache-->>Caller: Cached result
        else LLM returns None
            Caller->>Kaikki: Fallback lookup_kaikki_native()
            alt Native Kaikki found
                Kaikki-->>Caller: Native definition
                Caller->>Cache: Store positive cache
            else Native Kaikki not found
                Caller->>Kaikki: Fallback lookup_kaikki_english()
                alt English Kaikki found
                    Kaikki-->>Caller: English definition
                    Caller->>Cache: Store positive cache
                else All fallbacks failed
                    Caller->>Cache: Store negative cache
                end
            end
        end
    end
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Bouncing through the cache with glee,
Three tiers deep where definitions be,
LLM first, then Kaikki's call,
Fallback gracefully through them all!
Tests ensure our logic's sound,
Definitions safely cached and found. 🗂️

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding Kaikki fallback functionality and refactoring definition lookup code. It is specific, concise, and clearly communicates the primary objectives of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@Hugo0 Hugo0 merged commit b19ef79 into main Mar 4, 2026
4 checks passed